Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace stop with return and provide and bmi_failure flags for EnergyModule error #108

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

SnowHydrology
Copy link
Contributor

@SnowHydrology SnowHydrology commented Apr 30, 2024

This PR replaces the stop statement with return in EnergyModule.f90. This will allow the NextGen framework to provide basin and timestep information when BMI_FAILURE is reported.

Changes:

Copy link
Contributor

@PhilMiller PhilMiller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like some changes may need to be made up the call stack from EnergyMain to avoid weird side effects and consequences.

As I read it, we're expecting a stack like this where the error occurs:

  • noahowp_update_until
  • noahowp_update
  • advance_in_time
  • solve_noahowp
  • EnergyMain

When we hit the early return from EnergyMain, there's still the following call to WaterMain in solve_noahowp that'll run, unless error checking and early returns are propagated.

@drakest123
Copy link

I agree with @PhilMiller that we need to immediately propagate an error up the call stack to avoid side effects of a late return from an error. I don’t see how to avoid conditional statements that check for errors before the end of a subroutine but most of the stop statements come from initialization checks at the beginning of a model run. In that case, conditional statements would have minimal impact on execution speed so the remaining consideration is with cluttering the code with conditional statements. Based on the example in PR #108, I’ve prototyped dealing with upstream error propagation. Logging functionality is included (in log_error() subroutine) such that write statements are contained within a single subroutine. log_error() would replace handle_err() and sys_abort() subroutines. Consistent with NGEN, existing preprocessor directives would control program flow (in driver/NoahModularDriver.f90) and logging status for non-fatal messages (in log_error() . Subroutines that include an error check would need to use ErrorCheckModule. ErrorCheckModule contains three public parameters:

  integer, parameter, public :: NOM_SUCCESS = 0
  integer, parameter, public :: NOM_FAILURE = 1
  integer, parameter, public :: NOM_MESSAGE = 2 

The values of NOM_SUCCESS and NOM_FAILURE are consistent with BMI_SUCCESS and BMI_FAILURE. Impacted files are:

% git status
On branch error_handling
Your branch is up to date with 'origin/error_handling'.

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	modified:   ../bmi/bmi_noahowp.f90
	modified:   ../driver/NoahModularDriver.f90
	modified:   EnergyModule.f90
	modified:   ErrorCheckModule.f90
	modified:   RunModule.f90

File additions/changes are:

% git diff
diff --git a/bmi/bmi_noahowp.f90 b/bmi/bmi_noahowp.f90
index d26aaef..3178acc 100644
--- a/bmi/bmi_noahowp.f90
+++ b/bmi/bmi_noahowp.f90
@@ -267,14 +267,14 @@ contains
   function noahowp_update(this) result (bmi_status)
     class (bmi_noahowp), intent(inout) :: this
     integer :: bmi_status
+    bmi_status = BMI_SUCCESS
 
     call advance_in_time(this%model)
-    if (this%model%domain%error_flag /= 0) then
+    if (this%model%domain%error_flag == BMI_FAILURE) then
       bmi_status = BMI_FAILURE
       return
-    else
-      bmi_status = BMI_SUCCESS
     end if
+
   end function noahowp_update
 
   ! Advance the model until the given time.
diff --git a/driver/NoahModularDriver.f90 b/driver/NoahModularDriver.f90
index a86ef93..f799713 100644
--- a/driver/NoahModularDriver.f90
+++ b/driver/NoahModularDriver.f90
@@ -43,6 +43,14 @@ program model_driver
   print*, "Running..."
   do while (current_time < end_time)
     status = m%update()                       ! run the model one time step
+    if (status == BMI_FAILURE) then
+#ifdef NGEN_ACTIVE
+      return status                           ! if NGEN
+#else
+      print*, "Stopping program."
+      stop
+#endif
+    end if
     status = m%get_current_time(current_time) ! update current_time
   end do
 
diff --git a/src/EnergyModule.f90 b/src/EnergyModule.f90
index 22c7577..4d0f44e 100644
--- a/src/EnergyModule.f90
+++ b/src/EnergyModule.f90
@@ -2,6 +2,7 @@ module EnergyModule
 
   use LevelsType
   use DomainType
+  use ErrorCheckModule
   use OptionsType
   use ParametersType
   use WaterType
@@ -45,6 +46,7 @@ contains
     REAL                                 :: D_RSURF  ! Reduced vapor diffusivity in soil for computing RSURF (SZ09)
     
     REAL                                 :: FIRE   !emitted IR (w/m2)
+    CHARACTER(len=256)                   :: error_string
     !---------------------------------------------------------------------
 
     ! Initialize the the fluxes from the vegetated fraction
@@ -299,13 +301,20 @@ contains
     END IF
 
     FIRE = forcing%LWDN + energy%FIRA
+    FIRE = -1     ! TESTING ERROR HANDLING
     IF(FIRE <=0.) THEN
-      domain%error_flag = 1
-      WRITE(*,*) 'emitted longwave <0; skin T may be wrong due to inconsistent'
-      WRITE(*,*) 'input of SHDFAC with LAI'
-      WRITE(*,*) domain%ILOC, domain%JLOC, 'SHDFAC=',parameters%FVEG,'parameters%VAI=',parameters%VAI,'TV=',energy%TV,'TG=',energy%TG
-      WRITE(*,*) 'LWDN=',forcing%LWDN,'energy%FIRA=',energy%FIRA,'water%SNOWH=',water%SNOWH
-      WRITE(*,*) 'Exiting ...'
+      domain%error_flag = NOM_FAILURE
+      !WRITE(*,*) 'emitted longwave <0; skin T may be wrong due to inconsistent'
+      !WRITE(*,*) 'input of SHDFAC with LAI'
+      !WRITE(*,*) domain%ILOC, domain%JLOC, 'SHDFAC=',parameters%FVEG,'parameters%VAI=',parameters%VAI,'TV=',energy%TV,'TG=',energy%TG
+      !WRITE(*,*) 'LWDN=',forcing%LWDN,'energy%FIRA=',energy%FIRA,'water%SNOWH=',water%SNOWH
+      !WRITE(*,*) 'Exiting ...'
+      write(error_string,101) 'EnergyMain: emitted longwave <0; skin T &
+       may be wrong due to inconsistent input of SHDFAC with LAI. ILOC=', &
+       domain%ILOC, ', JLOC=',domain%JLOC, ', SHDFAC=',parameters%FVEG,', &
+       parameters%VAI=',parameters%VAI,', TV=',energy%TV,', TG=',energy%TG
+101   format(A,I10,A,I10,A,F8.3,A,F8.3,A,F8.3,A,F8.3)
+      call log_message(domain%error_flag, trim(error_string))
       RETURN
     END IF
 
diff --git a/src/ErrorCheckModule.f90 b/src/ErrorCheckModule.f90
index f330696..33ef516 100644
--- a/src/ErrorCheckModule.f90
+++ b/src/ErrorCheckModule.f90
@@ -1,12 +1,15 @@
 module ErrorCheckModule
 
-  ! General error checking routins
-
+  ! General error checking routines
   implicit none
+  integer, parameter, public :: NOM_SUCCESS = 0
+  integer, parameter, public :: NOM_FAILURE = 1
+  integer, parameter, public :: NOM_MESSAGE = 2
 
   private
   public:: sys_abort
   public:: is_within_bound
+  public:: log_message
 
   interface is_within_bound
     module procedure is_within_bound_int
@@ -32,6 +35,29 @@ contains
 
   end subroutine sys_abort
 
+  subroutine log_message(err, message)
+
+    ! log information, typically an error
+
+    implicit none
+
+    integer, intent(in) :: err                  ! error code
+    character(*), intent(in) :: message         ! message
+
+    ! If error, write the error. If message, write message unless NGEN_QUIET
+    if(err==NOM_FAILURE)then
+      write(*, '(A,I2,A)') ' Error Code: ', err, ',  Message: '//trim(message)
+      call flush(6)
+    endif
+#ifndef NGEN_QUIET
+    if(err==NOM_MESSAGE)then
+      write(*, '(A,I2,A)') ' Error Code: ', err, ',  Message: '//trim(message)
+      call flush(6)
+    endif
+#endif
+
+  end subroutine log_message
+
   function is_within_bound_int(var, lower_bound, upper_bound) result(withinbound)
 
     ! check if a integer value is within specified bounds
diff --git a/src/RunModule.f90 b/src/RunModule.f90
index cb357b8..fdcb20c 100644
--- a/src/RunModule.f90
+++ b/src/RunModule.f90
@@ -18,8 +18,10 @@ module RunModule
   use EnergyModule
   use WaterModule
   use DateTimeUtilsModule
+  use ErrorCheckModule
   
   implicit none
+
   type :: noahowp_type
     type(namelist_type)   :: namelist
     type(levels_type)     :: levels
@@ -246,6 +248,9 @@ contains
     type (noahowp_type), intent (inout) :: model
 
     call solve_noahowp(model)
+    if (model%domain%error_flag == NOM_FAILURE) then
+      return
+    end if
 
     model%domain%itime    = model%domain%itime + 1 ! increment the integer time by 1
     model%domain%time_dbl = dble(model%domain%time_dbl + model%domain%dt) ! increment model time in seconds by DT
@@ -308,6 +313,9 @@ contains
     !---------------------------------------------------------------------
 
     call EnergyMain (domain, levels, options, parameters, forcing, energy, water)
+    if (domain%error_flag == NOM_FAILURE) then
+      return
+    end if
 
     !---------------------------------------------------------------------
     ! call the main water routines (canopy + snow + soil water components)

@SnowHydrology SnowHydrology marked this pull request as draft May 14, 2024 19:10
@GreyREvenson
Copy link
Contributor

GreyREvenson commented May 22, 2024

Offering my thoughts for consideration, @SnowHydrology

First, on the overall approach, the use of an error flag variable seems appropriate (e.g., domain%error_flag). I failed to find a superior solution and am not confident that one will be found (see Does exception handling exist in Fortran?). Unfortunately, it does seem like we need to add flag checks (e.g., if (domain%error_flag .eq. NOM_FAILURE) return -- the one line syntax could help decrease clutter) immediately after any call to a subroutine within which the error flag could be set. This is a lot of clutter, as @drakest123 noted, but would be necessary to satisfy @PhilMiller's stipulation that the early return be immediately propagated up the stack. I don't know how else to do it.

Second, regarding the error message, I'd advocate for saving the error message and printing when NGEN would print the time step and location information for the error (if possible). Instead, this code (to my current understanding) calls log_message (in ErrorCheckModule) to print the error message when the error is encountered.

Third, @drakest123, I'm not sure if you meant to add your commits atop those from @SnowHydrology in his error_handling branch but I'm noting that here as it caused some initial confusion on my part.

@@ -31,6 +32,8 @@ module DomainType
integer :: croptype ! crop type
integer :: isltyp ! soil type
integer :: IST ! surface type 1-soil; 2-lake
integer :: error_flag ! flag for energy balance error (0 = no error, 1 = longwave < 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we use ErrorCheckModule to deal with the error messages, we may want to move the error_flag flag to ErrorCheckModule instead.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a possible solution although I'd need to also review how domain%error_flag is used apart from how we are using it for the current set of updates. Another possibility is to bind a new error object to the model% object and dissociate error_flag from domain%.


implicit none

contains

subroutine open_forcing_file(filename)
subroutine open_forcing_file(filename, error_flag)
Copy link
Contributor

@GreyREvenson GreyREvenson May 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than explicitly passing error_flag, could you just make error_flag a public variable in ErrorCheckModule and set it in anywhere that uses ErrorCheckModule?

Along those same lines, could you make a public character string in ErrorCheckModule and write any error message to that variable when the error is encountered? Then use ErrorCheckModule in bmi_noahowp.f90 to be able to print the error message there (when the error code has already been passed up the stack)?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it would be possible to remove error_flag from some of the intermediate call signatures. That would clean up some of the call signatures at the expense of requiring programming care not to co-opt that variable for other uses.

Passing the error string up the stack would require adding an error string to the domain object (as currently constructed). I agree that would be more complete. That is Keith's call. If we do this, then it would make sense to follow through with your first suggestion about removing error_flag from some call signatures. When passing these to two variables up to BMI, we would need to either pass a single error object or pass both the error code and error string. I'd prefer to avoid binding ErrorCheckModule to BMI.

stop ": ERROR EXIT"
error_flag = NOM_FAILURE
write(error_string,'(A,A,A)') "AsciiReadModule.f90:open_forcing_file(): File: ",trim(filename), " does not exist. Check the forcing file specified as a command-line argument"
call log_message(error_flag, error_string)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See previous comment regarding writing to a possible public character variable in ErrorCheckModule instead of calling log_message to print right away.

Copy link
Contributor

@GreyREvenson GreyREvenson May 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, seems like kind of a bummer that we'd have to manually record where in the program the error occurs via the error message. My understanding is that this is because we're going to use return statements instead of stop statements in which case the program won't record any locational information (i.e., where in the code the error happens). Is there any way to automatically record locational information to pass along side the error code and message?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this code is being run through the C pre-processor (as use of #ifdef NGEN_ACTIVE suggests), then we could define a macro along the lines of

#define NOAH_OWP_MODULAR_REPORT_ERROR(error_string) \
    error_flag = NOM_FAILURE \
    call log_error_message(error_flag, error_string, __FILE__, __LINE__)

Those __FILE__ and __LINE__ macros would capture the location of the error report, though not the function name

I'm a touch concerned that these files might not get pre-processed as expected, since that usually seems to go with a .F90 (upper-case) file extension. As I understand it, though, that's just a convention, and the build scripts could do whatever. The shift to a cmake-driven build system may change that, though.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I considered the preprocessor approach that @PhilMiller discusses. Besides the issues with this approach that Phil brings up, I don't think that using preprocessor directives for delineating error sources would be appropriate. Currently, the program has sparse reliance on preprocessor directives and they are used for major functional conditions such as whether the code is run standalone or as a module. These few preprocessor directives could easily be removed so the reliance on them is easy to decouple.

One possible intermediate improvement would be to define a string that contains the subroutine name and another string with module scope that contains the module name. Then, the error string would refer to those variables instead of them being hardwired. Given that LINE would unambiguously identify the line where the error is reported, unique error messages alternatively identify error sources although the onus is more on the programmer.

@drakest123
Copy link

drakest123 commented Jun 15, 2024

Updated error handling, addressing some of Grey Evenson's suggestions. Ready for review by @SnowHydrology. Changes include:

  • add ErrorType to contain error flag and error message in the BMI context. ErrorType is instantiated in RunModule before other objects so that it is available to be populated with error information early in the initialization process. This new file necessitated changes to several Makefiles.
  • removed error_flag as a variable in DomainType
  • removed error_flag from many subroutine call signatures, instead relying on USE ErrorCheckModule to pass error information
  • in RunModule, error_flag and error_string are passed from ErrorCheckModule to ErrorType. The save_error_state() method decouples the model code from BMI.
  • added moduleName as a private constant string that contains the name of the module that produced an error
  • added subroutineName as a local constant string that contains the name of the subroutine that produced an error
  • note that ErrorCheckModule:log_message() contains error_flag and error_string even though these variables are implicitly passed as members of the module. Explicitly passing these variables allows for passing alternate values in a context that does not explicitly support function overloading.
  • all of the error checks that were touched by the working code were testing in an intermediate state. Many of the error checks were tested in the final state. Error checks that were not touched by the working program (inactive code) were not tested.
  • currently, error_flag is returned to a calling program (in the form of bmi_status). error_string is available in the ErrorType object if, in the future, the calling program requests the error message in the subroutine signature.

@drakest123 drakest123 marked this pull request as ready for review June 15, 2024 00:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants